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problem, see e.g. [17], [6], [51]. In all of these applications, it is crucial to put a bound on 
the degree of the interpolant so that the controller, filter, etc. has limited complexity. As is well 
known, while the Nevanlinna-Pick theory features a simple criterion in terms of the Pick matrix 
for the existence of solutions and beautiful iterative techniques (Schur-type algorithms) to produce 
solutions when they exist, the degree specification on the interpolant is much harder to capture in 
this framework. The overcoming of this difficulty by the Byrnes, Georgiou, Lindquist school has 
open the way to several new apphcations in speech processing, bioengineering and robust control, 
see [5], [43], [33]. Notice that [26], [3], [28] deal with the more difficult multidimensional case. 

One of the central steps, in these authors approach, is the formulation of a convex optimiza- 
tion problem that includes as a (very) special case maximum entropy problems. The smooth 
parametrization of the complete class of interpolants occurs in the optimization setting, where it 
is crucial the dependence of the criterion on certain "a priori" parameters, cf. e.g. Remark 3.2 
below. It should be observed that, as in all of the previously mentioned applications, the primal 
problem is infinite dimensional, while the dual problem is finite dimensional. Hence, it is natural 
to seek the (unique) solution to the primal problem via duality theory. 

In [32], Georgiou and Lindquist, have applied this convex optimization approach to constrained 
spectrum approximation. The basic ingredients of the optimization problem are the following: 
An a priori power spectral density 'i> is given. Then new data become available in the form of 
asymptotic state-covariance statistics for a bank of filters. The latter induces a linear constraint 
on the family of spectral densities. It is then necessary to find a spectrum $ that satisfies the 
constraint and is as close as possible to \& in a prescribed metric. 

In [32], a KuUback-Leibler criterion was employed, where minimization is performed with 
respect to the second argument. This unusual choice was dictated by two considerations: i) The 
desire to have maximum entropy as a special case (^ = /); ii) the simple form of the optimal 
solution belonging to a parametric family of "rational" densities. The latter class, as well as 
another parametric class of "exponential type" [27, p.3], were recognized from the start [9], 
[7] to be critical points of logarithmic entiopy-like functionals. In [27] and in [28], homotopy 
like methods were proposed as an effective tool to solve a class of scalar and multidimensional 
generalized moment problems. 

In this paper, we seek to investigate constiained approximation of spectral density functions 
in a different metric, also originating in mathematical statistics, namely the HelUnger distance 
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[15], [39], [36], [40]. Observing that this distance amounts to the minimum distance between 
spectral factors of the two spectra, we are led to a natural extension of this metric to multivariable 
spectra (Theorem 6.1). We then show that the latter observation opens the way to attack the 
general multivariable case where, differently from the KuUback-Leibler case, an explicit form 
for the optimal solution may be obtained, see Theorem 7.2 . We also establish an existence 
theorem for the dual problem (Theorem 7.6) that paralles a correspondig fundamental result due 
to Byrnes and Lindquist [12]. We finally investigate iterative numerical methods to solve the 
dual problem. Although the dual problem is an unconstrained convex finite dimensional problem, 
the numerics is nontrivial. As observed in [3, Section VI], [32, Section VII], the dual functional 
has unbounded gradient at the boundary. Reformulation of the problem to avoid this difficulty 
may lead to loss of global convexity, requiring to initialize any descent method close to the 
minimum [6], [18], [42], [3]. As in [45] for the KuUback-Leibler case, we prefer to employ 
matricial descent methods. A number of nontrivial difficulties in the algorithms are overcome by 
resorting to ideas and results from spectial factorization theory. In our simulation, these iterative 
schemes (particularly a Newton type method with backtiacking line search) appear to perform 
effectively and reliably. 

The paper is outlined as foUows. Section II is devoted to the formulation of a generalized 
moment problem in the sense of Bymes-Georgiou-Lindquist, and to the corresponding existence 
question. In Section 111, two approximation problems for scalar spectral densities are introduced. 
The first employs a KuUback-Leibler type criterion, the second features the Hellinger distance. 
Optimality conditions for these two problems are presented in Section FV. The multivariable 
version of the two approximation problems, and the corresponding difficulties in the variational 
analysis, are discussed in Section V. Section VI is devoted to the intioduction of a new metric on 
multivariable spectial densities induced by the corresponding spectial factors. The multivariable 
spectrum approximation problem with respect to the distance of Section VI is solved in Section 
VII. Finally, Section VIII deals with the numerical solution of the dual problem. 

II. A GENERALIZED MOMENT PROBLEM 

We consider tiie following basic set-up patterned after [26], [32], [28]. Let <S!f'''™(T) be die 
family of bounded, coercive, C'^^'^-valued spectial density functions on the unit circle. Thus, a 
measurable, bounded matrix valued function $ belongs to 5!^^'"(T) if it satisfies the foUowing 
properties: 
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• the values of $ are m x m, Hermitian, non negative definite matrices; 

• there exists a positive constant c$ such that $(e*^) — c$/ is positive definite a.e. on T. 
Notice that $ G 5!^^'"(T) if and only if G S'^''"'(T). Let ^ G 5™^'"(T) represent an a 
priori estimate of the spectrum of an underlying zero-mean, wide-sense stationary m-dimensional 
stochastic process {y(n),n G Z}. We consider a rational transfer function 

G{z) = {zl - A)-^B, A G C"^", B G C"^'", (1) 

where A has all its eigenvalues in the open unit disc, B is full column rank, and {A, B) is 
a reachable pair. Here G models a bank of filters. We consider the situation where new data 
become available in the form of an asymptotic estimate S > of the state covariance of the 
system with transfer function G and input the unknown process y. In other words, we suppose 
we can estimate the covariance of the n- dimensional stationary process {xk\ G Z} satisfying 

Xk+\ = Axk + Byk, k eZ. (2) 

In general, 'i> is not consistent with E, and it is necessary to find $ in 5!|"^'"(T) that is closest 
to W in a suitable sense among spectra consistent with E, namely satisfying 

Jg^G* = ^, (3) 

where star denotes transposition plus conjugation. Here, and throughout the paper, integration 
takes place on [— tt, tt] with respect to the normalized Lebesgue measure d9/2n. The question 
of existence of $ G S^^^{T) satisfying (3) and, when existence is granted, the parametrization 
of all solutions to (3), may be viewed as a generalized moment problem. For instance, in the 
case m = 1, take G{z) with fc-th component Gk{z) = z^~'^~^. Take moreover 
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Here := E{y{n)y{n + k)}. This is the covariance extension problem, where the information 
available on the process y is the finite sequence of covariance lags Q), Ci, . . . , c„_i. It is known 
that the set of densities consistent with the data is nonempty if S > and contains infinitely 
many elements if E > [34], see also [22], [9], [10], [23]. 
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Existence of $ G S^^'^iT) satisfying constraint (3) is a nontrivial issue. It was shown in 
[24], [25] that such family is nonempty if and only if there exists H G c™^" such that 

E - AllA* = BH + H*B*, (5) 

or, equivalently, the following rank condition holds 

/ s - ai:a* b \ 

rank I = 2m. (6) 

y B* J 

We wish to give an alternative formulation of this existence result. First of all, notice that 
W.L.O.G. we can take E = /. Indeed, if S / /, it suffices to replace G with G' := Y.-^'^G and 
{A, B) with {A' = B' = Y'^/'^B). Thus constraint (3) from now on reads 

J G^G* = I. (7) 

Let Ub = B{B*B)~^B* denote the orthogonal projection onto Range(5). 

Proposition 2.1: A necessary and sufficient condition for the existence of spectra in ^^^^'"(t) 
satisfying (7) is that the following relation holds 

{I - Ub) {I - AA*) {I - Ub) = 0. (8) 

When (8) is satisfied, there exists $ G «S!^^'"(T) satisfying (7) of McMillan degree less than or 
equal to 2n. 

Proof: Necessity: Suppose there exists y m-dimensional, wide-sense stationary with spectral 
density $ G S!^^'^{T) satisfying (7). Let x be defined by (2). Taking covariances on both sides 
of (2), we get 

I = AA*+ AE{xkyl}B* + BE{ykxl}A* + BE{ykyl}B*. 
The latter relation implies (8). 

Sufficiency: We adapt the argument in [26, p. 18 14]. For a given purely non deterministic m- 
dimensional process y with spectrum $, define the process w as the output of the linear stable 



system 

Xk+i = Axk + Byk, (9) 

Wk = iB*B)-'B*Xk+i (10) 
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Inverting the system (9)-(10), we get 



Xk+i = {I - liB)Axk + Bwk, 



(11) 



{B*B)-^B*Axu-i+wu. 



(12) 



Write (8) as a Lyapunov identity 



i = {i-iiB)AA* (/-nB) + nB. 



(13) 



Since {A, B) is controllable, so is the pair ((/ - Hs)^, B{B*B)-^/^). It now follows from (13) 
that (/ — n^)^ has all eigenvalues in the open unit disc D. Thus, system (11)-(12) is stable, 
{B*B)~^B*G{z) is minimum phase and the processes y and w are causally equivalent. It follows 
that if we choose to be a white noise sequence with intensity E{wkwl} = {B*B)~^ and y 
to be defined by (11)-(12) then: i) (11)-(12) is the innovation representation of y; ii) the state 
covariance of the steady-state Kalman filter (ri)-(12) satisfies the Lyapunov equation (13) and 
is therefore the identity; iii) the spectral density of y is given by 



is the transfer function of (ri)-(12). We conclude that if we feed G in (9) with such a process 
y, the filter state x will have the required covariance, namely the identity matrix, and (7) will be 
satisfied. Moreover, $y is rational of McMillan degree at most 2n and it belongs to 5!|"^™(T) 
since its values and the values of <l>~^ on T are positive definite matrices. The spectrum (14) has 
been shown in [26, Section HI] to be the maximum entropy spectrum among those satisfying 
(7). This is there accomplished in a clever way, by relating the constiained maximum entiopy 
problem to a special one-step-ahead prediction problem. 

III. Constrained spectrum approximation: The scalar case 

A. Kullback-Leibler criterion 

In [32], the Kullback-Leibler measure of distance for spectia in 5+(T) := 5^^^(T) was 
introduced: 



% = W{z){B*B)-^W{z) 



* 



(14) 



where 



W{z) = 1- {B*B)-^B*A {zl -{I- nB)A)-^ B, 
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As is well known, this pseudo-distance originates in hypothesis testing, where it represents the 
mean information for observation for discrimination of an underlying probability density from 
another [38, p.6]. It also plays a central role in information theory, identification, stochastic 
processes, etc., see e.g. [37], [16], [14], [21], [2], [13], [49], [46] and references therein. It is 
also known in these fields as divergence, relative entropy, information distance, etc. If 



we have ©(^^H^) > 0. The choice of ©(^^H^) as a distance measure, even for spectra that have 
different zeroth moment, is discussed in [32, Section III]. It is observed there that the constraint 
(3) often fixes the zeroth Fourier coefficient of feasible spectra (this happens for sure when A is 
singular). In that case, rescaUng we are guaranteed that the index is nonnegative and equal to 
zero if and only if the two spectra are equal. T. Georgiou has kindly informed us [29], that even 
when A is nonsingular, under a rather mild assumption, it is possible to modify the index so that 
all $ satisfying the constraint have the same zeroth moment. In any case, the method entails a 
rescaling of the a priori density so that the optimization problem amounts to approximating 
the "shape" of the a priori spectrum. This is of course sensible to pursue in several engineering 
appUcations such as speech processing. 

We mention, for the benefit of the reader, that, in the same spirit, Georgiou has very recently 
investigated other distances for power spectra, [30], [31]. Motivated by classical prediction theory, 
where the optimal one step ahead predictor does not depend on the norm of the spectrum, 
he seeks natural distances between rays of spectral densities. Considering the degradation of 
performance when an optimal predictor for one stochastic process is employed to predict a 
different stochastic process, he is naturally led to introduce a certain metric on rays. 

As observed in the introduction, notice that minimizing D(\l/||$) rather than D($||\l/) is 
unusual with respect to the statistics-probability-information theory world. Besides leading to a 
more tractable form of the optimal solution, however, it also includes as special case (\I/ = 1) 
maximization of entropy [26]. In [32], the following problem is considered: 

Problem 3.1: (Approximation problem 1) Given \I/ G S{J), find ^kl tiiat solves 




minimize D(^||$) 



(15) 



over 




(16) 
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Remark 3.2: In the context of the covariance extension problem (4), the minimizers of (3.1), 
when ranges over positive trigonometric polynomials of degree n, are precisely the coercive 
spectra consistent with the first n covariance lags and of degree at most 2n, [9], [10], [22], [23]. 
This illustrates the role of the "a priori parameter" \& in obtaining a description of all solution 
to the moment problem of prescribed complexity. 

B. Hellinger criterion 

In this paper, we consider a different metric on spectral density functions. Given $ and \& in 
5(T), the Hellinger distance is defined by 



, , 1/2 



2 C^^ 

2^ 



It is a bona fide distance on S{T). Moreover, it satisfies the following properties. 
Proposition 3.3: Consider $, ^' G S{T). Then 



1) M*,*) < Vl|$||i + ||*||i; 

2) d^($,*)2<||$-*||i; 

3) ||$-*||i< (vW+^/Wk)c^i^(^,*)• 
These extend well-known properties of the Hellinger distance in the case of probability density 

functions. The straightforward proof may be found in [20]. 



Remark. On a finite-dimensional statistical manifold, endowed with the Fisher information 
as the metric tensor, both the HelHnger distance and the KuUback-Leibler pseudo-distance can 
be viewed as instances of the broader concept of a-divergences between two points, which 
arise from the so-called Amari connections. In particular, the 0-divergence, which indeed is the 

Hellinger distance, arises from the Levi-Civita connection. See [1, p. 66 and following]. 
We consider the following approximation problem. 

Problem 3.4: (Approximation problem 2) Given \I/ G S{T), find that solves 

minimize o?^($, (17) 
over 1$ G S{T) \ f G^G* = l\ . (18) 
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IV. Optimality conditions 

A. Kullback-Leibler approximation 

Consider first Problem 3.1. The variational analysis in [32] is outlined as follows (see also 
[45]). For A G Hermitian satisfying G*AG > on all of T, consider the Lagrangian 

junction 

L($,A) = D(*||$)+tr I^A^^yC^G*-/^^ 

= D(*||$) + y"G'*AG'$-tr(A), (19) 

where "tr " denotes the trace operator. Consider the unconstrained minimization of the strictly 
convex functional L(<I>,A): 

minimize{L($, A)|$ G 5(T)} (20) 

This is a convex optimization problem. The variational analysis yields the following result. 
Theorem 4.1: The unique solution ^kl to problem (20) is given by 

Moreover, suppose A = A* is such that 



G*AG > 0, e T, (22) 

/ G-^G* = I. (23) 
J G*AG 

Then ^kl given by 

^KL = —J- (24) 
G*AG 

is the unique solution of the approximation Problem (3.1). 

Thus, the original Problem 3.1 is now reduced to finding A satisfying (22)-(23). This is accom- 
plished via duality theory. Consider the dual functional 

A^inf{L($,A)|$ e5(T)}. 

For A satisfying (22), the dual functional takes the form 

A^^(^;^,A) = I *logG*AG-tr(A) + J ^. (25) 

Consider now the maximization of the dual functional (25) over the set 

jC^^ ■- {A = A*\G*AG > 0, e T}. (26) 
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Let, as in [32], 

J* (A) := - y * log G*^G + tr (A) 

The dual problem is then equivalent to 

minimize {J^{A.)\A e jC^^}. (27) 

The dual problem is also a convex optimization problem. In [32], A is further restricted to belong 
to the range of the operator F defined on the set Ch{T) of Hermitian- valued continuos functions 
defined in T, by 

r($) = J $ G Ch{T). (28) 

As mentioned in Section n (equation (5)), 

Range(r) = {E = E* : 3H e C"^" s.t. S - ylEA* = BH + H*B*, } (29) 

The problem then becomes 

minimize {J^(A)|A G £^^}, = Z:^^ n Range (F). (30) 

The reason is that the orthogonal complement of Range(r) is given by 

Range(r)^ = {M = M*|G*MG = OonT}. (31) 

This follows from the fact that M e Range(r)^ iff G Ch(T) 

= ti{J G^G*M) = J G*MG^. 

The dual functional is shown in [32] to be strictly convex on the restricted domain Cy^. It is 
also shown in [12] that J^r has a unique minimum point in C^^. This result implies that, under 
assumption (6), there exists a (unique) A in C^^ satisfying (23). Such a A then provides the 
optimal solution of the primal problem (15)-(16) via (24). 

B. Hellinger approximation 

The variational analysis for Problem 3.4 is very similar, see [20] for details. We state without 
proof the following result: It will be proven in Section VII in the (more general) multivariable 
case. 
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Theorem 4.2: Assume that Problem 3.4 is feasible, namely that condition (6) (or, equivalently, 
condition (8)) is satisfied. Then there exists A = A* G C"^" such that 

(1 + G*AGy 

In this case. Problem 3.4 admits a unique solution which is given by 



1 + G*AG > 0, G T, J G ^^ G* = 1- (32) 



= ^ (33) 

(1 + G*AG)2 

Remark 4.3: Suppose the a priori density ^ is rational. Then, the solution in (33) has in 
general degree 2n higher than the solution in (24). 

V. Constrained spectrum approximation: The multivariate case 

A. Kullback-Leibler approximation 

Multivariable Kullback-Leibler approximation has been investigated in [26], [28], whereas 
[3] deals with the multivariate Nevanliima-Pick problem. In statistical quantum mechanics, the 
state of an n-level system is represented by a density matrix p, namely a Hermitian, positive- 
semidefinite matrix in C"^" with unit trace [48]. The convex set of density matrices has as 
extreme points the one dimensional projections. The latter can be identified with the pure states 
of the system where ■0 is a unit vector in C", via p — {i/j, ■)'ip. Quantum analogues of entropy- 
like functionals have been considered since the early days of quantum mechanics [50]. Recently, 
renewed interest has originated in Quantum Liformation applications [44]. The quantum relative 
entropy between two density matrices is defined by: 

B{p\\a) := tr {p{\ogp- log a)). (34) 

Klein's inequality yields that D(p||(7) > if and only if p = a. Moreover, as in the classical 
case, the quantum relative entropy is jointly convex in its arguments. We are then led to the 
following definition: Given $ and W in S^^'^{T), the relative entropy D(\&||$) is given by 

D(*| 1$) = y tr (^(log * - log $)) . (35) 

First of all, we need to worry about nonnegativity of D(\&||$) and whether it is zero iff = 
Proposition 5.1: Lef $, * G «S+ """(T). Define *i = ^'/tr * and $i = $/tr$. Then 

D(*||$) = D(tr*||tr$) + ^(tr *)tr (*i(log*i - log$i)) . (36) 
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It follows that when /tr* = /tr$, then D(\l/||$) > 0. Moreover, B){^\\<^) = if and only if 
the two spectra coincide. 
Proof: 

D(*||$) = tr y^*(log*-log$) 

= tr / tr(W)*i (logtr - logtr ($)$i) 



tr j tr(^')^'i((logtr(^'))/ + log^'i - (logtr ($))/- log$i) 
tr y tr(^')^'i (log^^i - log$i) +tr y tr ((logtr (^f)) - (logtr ($))) 

j tr (*)tr (*i (log *i - log $i)) + j tr (*i)tr W log 



1 

tr* 



tr<l> 

= j tr(^')tr (^'i (log^'i -log$i)) +D(tr*||tr$). 
Since tr\l/i(e'^) = tr$i(e'^) = 1,V6' G [-tt, tt], it follows from Klein's inequality that 

tr *i(e^^) (log*i(e^^) - log$i(e^^)) > 0, V^. 

The latter imphes that 

j (tr *)tr (*i (log^i - log$i)) > 0. 

When /tr^f = /tr$, we also have D(tr^'||tr$) > 0. Thus, when Jtr^' = /tr$, 0(^^11$) 
is the sum of two nonnegative terms and the conclusion follows. □ 

Consider again Problem 3.1. 
Problem 5.2: (Approximation problem 1) For G 5!f''''"(T) 

minimize D(*||$) (37) 

over 1$ G 5!;^^'"(T) | j = / j , (38) 

where D(\&||$) is defined by (35). As in the scalar case, an a posteriori rescaling of the prior 
density is in general necessary. In the light of Proposition 5.1, if $ is the solution of (5.2), the 
new prior is 

* = 4 

/tr^' 
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For A G Qi^xi^ Hermitian such that G*AG is positive definite on all of T, define again the 
Lagmngian 

L($,A) = D(*||$)+tr I^A^^^G^G*-/^^ 

= D(*||$) + yG'*AG$-tr(A). (39) 

The following step, entaihng the unconstrained minimization of the strictly convex functional 
L($,A) on e (S+ is a stumbfing block. The optimality condition reads [28, Section 
IV] 

poo 

/ {^KL+rI)^{^KL + rI)dT = G*AG. (40) 
Jo 

In general, an expficit expression for ^kl in terms of \1/ and A cannot be obtained, and the 
variational analysis ends here. We mention that the minimization with respect to the first argument 
of the relative entropy can instead be carried out explicitly, leading to a solution of the exponential 
form 

$° = cexp(log*-G*AG), 

see [28, Section IV]. Homotopy like methods are described in [28] to find A, when it exists, 
such that $° satisfies the constiaint. 

B. Hellinger approximation 

Recall that, for a positive semidefinite Hermitian matrix M, M^l"^ is the square root of M, 
namely the unique Hermitian matrix whose square is M. If y is a unitary matrix that diagonalizes 
M so that M = y*diag (af , . . . , o?^V , tiien simply M^/^ = y*diag (ai, . . . , a^V . Motivated 
by the analogy with the KuUback-Leibler case, and by the scalar case, we define the Hellinger 
distance for $ and * in 5+ to be 

4($, := /" tr - (41) 

J-i, 27r 

Notice that (41) appears also as the natural generalization of the Hellinger distance for density 
operators of statistical quantum physics introduced in [41]. Consider again the strictly convex 
Problem 3.4: 

minimize (42) 
over 1$ G 5!^'><'^(T) | ^ G^G* = /| , (43) 
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where <i|^($, ^) is now given by (41). Define jC^ by 

^ C'^x^lA = A*, / + G*AG > a.e.onT}. (44) 

For A G consider the Lagrangian 

L^($,A) =4($,*) +tr (^A^^y G^G*-/^^. 

The unconstrained minimization of the strictly convex functional over $ G 5!^^™(T), 
however, leads to an optimality condition (expressing the the unique optimum in terms 
of and A) which does not appear to be useful. 

To obtain such optimality condition, we first need an expression for the directional derivative 
of the matrix square root. More precisely, given P = P* > let S{P) := P^/^ and 5P = 5P*: 
We want to compute 

Employing the chain rule, it is easy to see that 

6S{P, 6P)P^/'^ + P^/'^6S{P, SP) = 5P 

so that 

5S{P,5P)= / exp(-pi/2i)(5Pexp(-pi/2i)di. (45) 
Jo 

Taking (45) into account, we get the optimality condition 

poo ^1 

J \exp{-$]ih) - j exp{-$]ih)^ dt + -G*AG = 0. (46) 
The integral in (46) is the unique solution of the Lyapunov equation 

d^/^x + a:I>^/2 = l>^/^ - (47) 

Equations (46)-(47) now yield 

_ 1 1,1/2 AG) - ^(G*AG)$^/2 ^ 1,1/2 _ ^1/2^ 

which in turn gives 

l>^/^ (/ + G*AG) + (/ + G*AG) l>^/^ = 2*1/2. ^^^^ 
Since / + G*AG > almost everywhere on T, we finally get 

/•oo 

$1/2 = 2 / exp [-(/ + G*AG)i] ^^/^ exp [-(/ + G*AG)i] dt. (49) 
Jo 
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The maximization of the dual junctional A A), however, appears quite problematic. 

We show in the next section that, differently from the KuUback-Leibler case, it is possible 
to define a sensible Hellinger distance for matricial functions that leads to a full unraveling 
of the complexity of the optimization problem. This will be accomplished by connecting this 
problem to a most classical topic at the hearth of systems and control theory, namely the spectral 
factorization problem. 

VI. Hellinger distance and spectral factorization 

Let F be a measurable function defined on the unit circle T and taking values in C"^^. Then 
F belongs to the Hilbert space Li^^^ if it satisfies 

j irFF*de < oo. 

For G in Lf^y^^, the scalar product is defined by 

{F,G)2 = J ti FG*dd, 

so that \\F\\l = / tr FF*de. Let $ G ^^^'"(T). Then a measurable C^^^-valued function W 
is called a spectral factor of $ if it satisfies 

W{e'^)W{e'^Y = $(e'^), a.e. on T. 

Notice that necessarily p > m and W{e^^) is a.e. full row rank. Moreover, W is bounded on 
T, and therefore it belongs to L'^^p- When p = m, is also bounded and, consequently, 
W-^ G ^^xm- Any $ G 5!f''''"(T) satisfies the Szego condition 

/ logdet$(e^^)^ > -oo, 

and admits therefore spectral factors W in -ff^xm' namely the Hardy space of functions in i^^xm 
that posses an analytic extension in \z\ > 1, see e.g. [47], [35]. 

Let Wi and W2 be spectral factors of the same $ G 5+ ^""(T) with Wi square. Then trivially 
U := Wi^W2 is a m X p all-pass function, i.e 

U{e^)U{e'^y = I, G T. 

For $, G «S+ """(T), consider the following function 

dH{^,^)=[mi{\\W^-W^\\l: W^,W^eLl^^, W^W; = ^, W^W; = <P}Y^\ (50) 
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Theorem 6.1: The following facts hold true: 

1) For any square spectral factor we have: 

dH{<^,^)=[mi{\\W^-W^\\l: W^eLl^^, W^Wi = <^>}Y^\ (51) 

2) The infimum in the above equation is a minimum: Indeed the unique spectral factor of $ 
minimizing (51) is given by 

:= ($V2^$l/2)-V2^l/2^^_ 

3) dn is a bona fide distance function. 

4) dn coincides with the Hellinger distance in the scalar case. 
Proof: 

1) First of all, observe that, once fixed the spectral factor W^, any square spectral factor of 
\1/ can be written as = W<j,U, where U is a mx m all-pass. Hence, 

J tr {w^ - w^){w^ - w^)*de = y tr {w^ - w^u*){w^ - w^uydo. 

Observe, moreover, that W<j>U* is a square spectral factor of $, so that (51) holds. 

2) To show that the infimum in (51) is a minimum, notice that (51) may be rewritten in the form 

dni^, " {/ ~ $^''V)(ty* - ^^/^V)*d9 : V e L^xm. W* = ^| • (52) 

We shall solve this problem by unconstrained minimization of the Lagrangian 

L = Jtr [{W^ - $i/V)(l?* - $V2y)* + A(yy* - /)], 
where A = A* > 0. The first variation of the Lagrangian (at V in direction dV G L'^xm) 

dL{V- dV) = jtr [{AV - ^^/^W^)dV* + dV{AV - ^^/^W*)*] 
The second variation of the Lagrangian is 

d'^L{V]SV) = 2 J tr [$1/^51/*$^/^ + Ai/W5y*Ai/^]. 

Hence, L is strictiy convex and therefore V is a minimizer of the unconstiained minimization 
problem if and only if: 

5L{V; 5V) = 0, V 5V. (53) 
Condition (53) is clearly equivalent to AV — = or to 

V = A-^^^/'^W^. 
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Thus, if there exists A = A* > such that 



VV* = A~^$^/^^'$^/^A~^ = /, 



then V minimizes (52). Such a A is readily seen to be given by 



A= [$V2^$l/2]l/2_ 



In conclusion, the infimum in (51) is a minimum and 



is the unique minimizer. 

3) The distance properties of dn are easy to check: (i) Symmetry is an immediate consequence 
of the definition of dn- (ii) It is clear that (4r($, $) = 0. Conversely, if dni^, ^) = 0, then $ 
and W share a common spectral factor and are, therefore, the same spectral density, (iii) The 
triangular inequality is inherited by the definition of dn as the infimum of the L2 distance among 
spectral factors. Thus, given \& and T and chosen an arbitrary square spectral factor Wr of 
T, we have 

(i^($,*)= inf \\W^-W^\\2< inf - V^^rlb + ||VI/* - Wrlb] = 

= inf - W^t||2 + inf - Wrh = ^if($,T) + Jif(*,T) 

W<i, Will 

where the last equality is a consequence of point 1). 

4) By choosing = \&^/^, it is immediate to check that in the scalar case (m = 1) = 1 and 
hence du coincides with the Hellinger distance du- □ 

VII. ^H-OPTIMAL MULTIVARIABLE SPECTRUM APPROXIMATION 

Theorem 6.1 shows that du is a natural extension to the multivariable case of the Hellinger 
distance. The corresponding multivariable version of Problem 3.4 is the following: 



It is in this form that the optimization problem is amenable to the variational analysis even in 
multivariable version. Let 



Problem 7.1: Given W G 5!^'^'"(T), find $ G 5!^'^'"(T) that solves 



minimize 



(54) 




(55) 



{A e C"^"|A = A*, / + G*kG > Ve^^ G T} 



(56) 



February 8, 2007 



DRAFT 



DRAFT 



18 



and 

jC^ := jC^ n Range(r) (57) 

where F was defined in (28). The following is our main result. 

Theorem 7.2: Assume condition (6) (or, equivalently, condition (8)) is satisfied. Then there 
exists a unique A e such that 

J G{I + G*AG)-^^{I + G*AG)-'^G* = I. (58) 

The unique solution of the constrained approximation Problem 7.1 is then given by 

:= (/ + G*kG)-^^{I + G'*AG)-^ (59) 
We break the proof of this theorem into two parts: First, by unconstrained minimization of the 
Lagrangian function, we obtain an expression for a spectral factor of the optimal $ depending 
on the Lagrange multiplier matrix A (Lemma7.3). Second, we establish existence of a unique 
A G /:f satisfying (58) (Theorem7.6). 

For A G a spectral factor of and VK, W"^ G L^'^iT), form the Lagrangian 

function: 

L{W,K) = tr j - W*) {W - W^y + tr A GWW*G* - I 
Consider the unconstrained minimization problem: 



(60) 



min{L(Vl/,A)|M/,H"-^ G L;^^'"(T)} (61) 

Lemma 7.3: The unique solution to problem (61) is given by 

W ^(I + G*AG)-^W^. (62) 
Proof: Let dW G L™^'"(T). The first variation of the Lagrangian is: 

SL{W,A;SW) = y" [6W{W -W^y + {W -W^)5W* + A{G5WW*G* + GW5W*G*)] 

= tr J {W-W^ + G*AGW) 6W* + (^ti J {W-W^ + G*AGW) 5W*^ 

By taking into account the cyclic property of the trace operator, the second variation of the 
Lagrangian is easily seen to be given by 

5^L{W, A; 5W) = 2tr J dW* {I + G*AG) SW (63) 
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which is clearly positive for any A G and 5W ^ 0. Hence L is strictly convex with respect 
to W. Moreover, the set is open and convex. To find the minimum point of L, we impose 
5L{W, A; 5W) = in each direction 5W. This yields (62). □ 

We now consider the question of existence of a matrix A G satisfying (58). To this end, 
we introduce the dual functional 

L{W, A) = ti J ((/ + G*AG)-^W^ - W^) ((/ + G*AG)-^W^ - W^)* 

+tr a(^J G{I + G^AGY'-W^W^il + G^AGY'-G* - 

= tr j -^-{1 + G*AG)-i* - tr A, AG . (64) 
Instead of maximizing (64), we consider the equivalent problem of minimizing the functional: 

J*(A) := -L(iy,A) +tr = tr y"(/ + G*AG)-^^+trA, A G (65) 

Lemma 7.4: The functional (65) is convex and its restriction to £f? (defined in (57)) is strictly 
convex. 

Proof: First of aU, observe that is an open, convex subset of the Hermitian matrices in 
£nxn^ For 6A G C"^" Hermitian, we compute the directional derivative: 

dJ^{A;dA) = -tr j {I + G*AG)-^G*SAG{I + G*AG)-^'i! + trSA 



[l- j G{I + G*AG)-^*(/ + G*AG)-^G*^ 6 A 



(66) 



= tr 

The second variation is then given by 

5^J^{A; SA) = 2tr J W;,{I + G*AG)-^G*6AG{I + G*AG)-^G*SAG{I + G*AG)-^W^ (67) 

which is clearly a non negative quantity. Hence J§ is convex on jC^. In view of (31), we have 
that 5^J^(A;5A) is strictly positive for any non-zero 6 A G Range(r), and consequently is 
strictly convex on jC^. □ 

As an immediate consequence of the above lemma, we have the following corollary. 
Corollary 7.5: The dual problem 

Find A G £f minizing J* (A) (68) 

admits at most one solution. Moreover, (58) is necessary and sufficient for A to solve the dual 
problem (68). 
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We now tackle the existence issue for the dual problem. Although this is a finite-dimensional, 
convex optimization problem, the existence question is quite delicate since the set jC^^ is open 
and unbounded. The proof of the following theorem is partially inspired by the proof of the 
corresponding result for the scalar, KuUback-Leibler case in [Section 2] [12]. 

Theorem 7.6: If Problem 7.1 is feasible, i.e. (6) (or, equivalently, condition (8)) is satisfied, 
then the dual functional (65) has a unique minimum point in C^- 

Proof: In view of Corollary 7.5, we only need to show that takes a minimum value on 
Cy . First, we observe that J* is continuous on its domain. Second, we show that is bounded 
below on £f . Indeed, by feasibihty, there exists a $ G 5+ """(T) such that / G^G* = /. Hence, 
for all M G C"^", / G$G*M = M, which implies 

tiM = ti J ¥/^G*MG¥/'^. (69) 

RecaUing that, for A G jC^, I + G%e''^)KG{e''^) is positive definite for all d G [0, 27r), and using 
the monotonicity property of the trace, we get 

tr A = tr j $1/2(7* AG$i/2 > -tr ^ $, VA G . (70) 

Define / := -tr / $ < 0. We get 

J*(A) := tr j (/+(?* A(?)-i*+tr A = ii j ■^^''^{I+G*KG)-^^^^^+ti A > /, VA G , 

(71) 

where we have used tr / *V2(/ _^ G* AG) -1*^2 > q. on £f . 

Finally, we show that J^r is inf-compact i.e. the sub-level sets J^i(— oo,r] are compact. This 
implies existence of a minimum point. Indeed, observing that J^f(O) = tr J \&, we can then 
restrict the search for a minimum point to the compact set J^i(— 00, tr / \&]. Existence for the 
latter problem then follows from Weierstrass Theorem. To prove inf-compactness of J^r, we 
proceed to show that: 

1) 



lim J*(A) = +00, 
where dC^ denotes the boundary of ; 



2) 

lim J*(A) = +00. 
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To prove property 1), notice that dC^ is the set of A in Range(r) for which: (i) / + G*AG is 
positive semidefinite on T and (ii) 3'd s.t. / + G*{e^'^)AG{e^'^) is singular. Thus, for A dC^, 
all the eigenvalues of [/ + G*AG]~^ are positive on T and, at least one of them, has a pole 
tending to the unit circle ([/ + G*AG] and [/ + G*AG]~^ are rational matrix functions!). Since 
^ is fixed and coercive, then also ^^/^[/ + Ct*ACt]~^^^/^ has all eigenvalues positive on T and, 
at least one of them with a pole tending to the unit circle as A ^ OCy . The conclusion follows 
by rewriting J* as J^{A) = tr / ^'i/^jj ^ G'*AG]-^*^/2 ^ ^j. A, in view of (70). 

Point 2) is more deUcate. Let A^ G be a sequence such that limfc^oo ||Afc|| = oo. Let 
A^ := 11^. It is easy to see that if A G C", then aA G C" for all ct G [0, 1]. Hence, for 
sufficiently large k, we have A^ G ■ 

Let 77 = liminf tr A^. We want to show that 77 is strictly positive. We first observe that ?7 > 0. 
In fact, tr A^ = p^tr Afc > p^/ 0, where we have used (70). 

Consider a sub-sequence of A^ such that the limit of its trace is rj. Since this sub-sequence 
remains on the surface of the unit ball dB := {A = A* : ||A|| = 1}, which is compact, it has a 
sub-sub-sequence converging in dB. Let A^. be such a sub-sub-sequence, and let A°° G dB be 
its limit. Clearly, 

lim A^. =trA°° = 77. (72) 

We now prove that A°° G . To this aim, notice that A°° is the limit of a sequence in the 
finite dimensional finear space Range(r) and hence it belongs Range(r). It remains to show 
that (/ + G*K°°G) is positive definite on T. Indeed, since (the unnormalized sub-sequence) A^. 
belongs to we have that (/ + G*Afc.G) is positive definite on T so that (p^/ + G*A° G) is 
also positive definite on T. Hence, G*A^G is positive semi-definite in T so that (/ + G^A'^G) 
is strictly positive definite on T. This proves that A°° G • The latter, together with (69) yields 

trA°° = tr j ¥/'^G*N^G¥/'^. (73) 

As seen before, G*A°°G is positive semi-definite on T. Moreover, G*A°°G is not identically 
zero since A°° G Range(r) (see (31)), and A°° =^ (it is not the zero matrix) since A°° G dB . 
We conclude, in view of (72) and (73), that 77 = tr A°° > 0. 
Finally, we have 

MAk) = ti [ ^1/2(7 + G*AfcG)-i*i/2 + tr Afc > ||Afe||tr A°. (74) 
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Since ||Afc|| ^ oo and liminf trA^ > 0, we get 

lim J*(Afc) = +00. (75) 

□ 

Let A G be the unique solution of the dual problem whose existence has just been proven 
in Theorem 7.6. We show below that it also provides via (59) the unique solution to the primal 
problem 7.1. 

Proof of Theorem 7.2: Let Wvj/ be any spectral factor of ^f. Let W = {I + G*AG)-W* as in 
(62). Let W, belonging toL^^'"(T) together with its inverse, satisfy the constraint 

J GWW*G* = I. (16) 

By Lemma (7.3), and by the strict convexity of the functional L(-, A), we get 

\\W -W^\\l = L{W,A) <L{W,A) = 

Thus, W minimizes the L2 distance to Wxj, among W belonging to L™^™(T) together with their 
inverse and satisfying constraint (76). Theorem 6.1 now shows that = WW* (coinciding 
with $if in (59)), is the unique solution to the multivariate approximation Problem 7.1. □ 

Remark 7.7: Consider the important covariance extension problem when, as it is often the 
case, the process y is real-valued. Then A and B are real matrices and is a real spectral 
density, i.e. '^{z) is real (and symmetric) for all ^ G T. In this case, A is a real symmetric 
matrix. 

Vin. Numerical solution of the dual problem 

A. A matricial Newton-type algorithm 

We now show how to efficiently implement a modified Newton algorithm for the computation 
of A (convergence of the algorithm, however, will be discussed elsewhere). This task requires 
some care because we are working in a matricial space and vectorization does not appear to 
be convenient. The road-map is the following. We have to find the minimum of the functional 
(65) which is strictly convex on jC^. This is then equivalent to finding a matrix A G jC^ that 
annihilates the derivative of J^(A) i.e. such that (58) is satisfied. According to the abstract 
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version of the Newton algorithm, A may be found as the limit of the sequence obtained by 
iterating the following steps: 

1) Choose an initial estimate Aq ^ of ^ (^he simplest choice being Aq = 0). 

2) Let Aj be the current estimate of A. Compute the directional derivative <5J*(Ai; 6 A) at the 
point Ai in direction 5 A as in (66). 

3) Compute the "Hessian" (second directional derivative) 5^ J*(Aj; (5Ai, (5A2) at the point Ai 
in directions SAi and dA2. This may be done in the same way in which we computed 
(5^ J*(Aj; 6 A) in (67). Indeed, the latter may be viewed as the "diagonal" part of the Hessian 
i.e. 5'^J^{Ai;5A) = (5V*(Ai; M, (5A). We get 

(5V*(Ai;Mi,M2) = ti J GQi^ [{G*5A2GQj^^) + (G^MsGg,"'*)*] Qi^G*5Ai 

ill) 

where Qi := (/ + G*AiG). 

4) Solve for X G equation 

d'^MK; 5 A, X) = -6J^{Ai; 6 A) V 6 A (78) 

and set the {i + l)-th estimate of A to the value A^+i = Ai + X 

5) Let e be a suitably small number. If 

II J G{I + G*Ai+iG)-^^{I + G*Ai+iG)-^G* - I\\ > e (79) 

then go to step 2). Otherwise, set A = A^+i. 
There are some very delicate points to be addressed. First of all, we need to worry about 
existence of solutions for equation (78). 

Proposition 8.1: Assume condition (6) (or, equivalently, condition (8)) is satisfied. There exists 
a unique X G Range(r) solving Equation (78). 
Proof: Equation (78) may be rewritten as 

J GQ-^ [(G*XGQ-i*) + {G*XGQ-^^y] Q'^G* = J GQ'^'ifQ-^G* - I (80) 

where we have eliminated 6A. Notice that the map (p associating to X G Range(r) the matrix 

^{X) := j GQ-' [{G*XGQ-'^) + {G*XGQ-'^y] Qi'G* 

defines a linear transformation of Range(r) into itself. In fact, clearly 

Qi' [{G*XGQr'^) + {G^XGQr'^Y] Qr' = [Qr' [{G*XGQi'^) + {G*XGQi'^y] Qr']' 

(81) 
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SO that by definition ^p{X) G Range(r). The linear map has trivial kernel. In fact, if for some 

X G Range(r), ip{X) = then 

= ii[ip{X)X] =6^J^{Ai;X,X). 

Taking into account the positive definiteness of 5^ J^(Aj; X, X) on Range(r), this impUes X = 0. 
As a consequence, the image of f is the whole linear space Range(r). It only remains to ob- 
serve that [/ Cg-^^g-^G* -I] e Range(r). Indeed, / G Range(r) and / GQ-^'i'Q-^G* G 
Range(r) by definition of Range(r). □ 

Proposition 8.1 guarantees the existence of a solution of (78) (or, equivalently, of (80)) in 
Range(r) but not in requested. To overcome this problem, we resort to a variant of the 

Newton algorithm known as Newton method with back-tracking. In this variant, the following 
sub-steps are employed in place of step 4): 

4.1) Solve for X G Range(r) equation (78). 

4.2) Let k = 1 and double k until both conditions 

Ai + yx] G £f , (82) 



k 

\\Vi\\ < \\Vi,k\l (83) 

are satisfied, where 

Vi := J G{I + G*A^G)-^^{I + G*AiG)-^G* - I 

and 

Vi,k := J G{I + G*{A, + ^X)G)-^^{I + G*(A, + ^X)G)-'G* - I. 

4.3) Set tile {i + l)-th estimate of A to tiie value A^+i = Ai + ^X 

This procedure guarantees that each Aj G even when X C^. Notice that, by convexity 
of the problem X is a descent direction so that, for sufficiently large k, (83) is certainly satisfied. 
Moreover, since jC^ is an open set, (82) is also satisfied for for sufficiently large k. 

B. Computation of the solution of equation (80) 

The next point that needs to be addressed is the computation of X (step 4.1)). In fact, although 
(80) is a linear equation, it is not obvious how to solve it in a numerically efficient way. To 
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simplify notation, we drop the subscript "i" in and Qi := {I+G*AiG). Consider the following 
equation 

J GQ-^ [(G'XGQ-^-^) + [G^XGQ-^^Y] Q'^G* = j GQ'^^Q'^G* - I (84) 

We propose the following procedure. 

1) Choose a set {Xi} of Unearly independent matrices such that span{Xi} = Raiige(r). 

2) Compute the quantities 

Yi := j GQ-' [{G*XiGQ-^^) + {G^XiGQ-^^)*] Q-^G\ (85) 

Y := j GQ-^^Q-^G* - /. (86) 

3) Solve for the scalar unknowns yi equation 

J2y^Y, = Y. 

i 

4) Set 

X = Y,y,X,. 

i 

Steps 3) and 4) do not present any difficulty. Concerning point 1), employing the characterization 
(29) of Range(r), we simply have to solve the n x m Lyapunov equations 

S - A1:A* = BHh,k + Hl^B* (87) 

where Hh^k is the matrix in which the entry in position (/i, k) is 1 and all the other entries 
are zero. The n x m matrices obtained by solving (87) span Range(r), but are not linearly 
independent. It is easy, however, to employ the singular value decomposition algorithm (which 
is very stable and robust) and reduce to a basis {Xi} of Range(r). 

Concerning point 2), in the case when \& is a rational matrix function we can compute 
the integrals in (85) and (86) very efficiently and precisely. We first describe in details the 
computation of Y. To compute / GQ'^'iQ'^G* we observe that % := GQ'^'ilQ'^G* is a 
spectral density. Let be an analytic spectral factor of x (namely a function analytic 
outside the unit disk and such that x — W'x^x-'' Then, / x is the steady-state covariance of 
the output of a filter with transfer function fed by normalized white noise. To compute a 
realization of we implement the following steps: 
2a) Compute a co-analytic square spectral factor of \& (namely \& = W^W^, being 
square and analytic outside the unit disk). This requires (see, e.g., [19]): 



February 8, 2007 



DRAFT 



DRAFT 



26 



- To decompose as = Z + Z* with Z being analytic inside the unit disk: This may 
be done by partial fraction expansion. 

- To solve an Algebraic Riccati equation of dimension equal to the McMillan degree of 
Z. 

2b) Factorize Q — {I + G*AG) as Q — A* A, with A being square and analytic together with 
its inverse outside the unit disk. This can be done by computing the stabilizing solution 
Xs of the following algebraic Riccati equation: 

X = A*XA - A*XB{I + B*XB)-^B*XA + A (88) 

We have the following realization for A~^: 

A-^ = (/ + B*XsB)-^/^ - (/ + B*XsB)-^B*XsA{zI - Ts)~^B(I + B*XsB)-^/^ (89) 

with Ts := A - B{I + B*XsB)~^B*XsA having all its eigenvalues inside the unit circle. 
2c) Compute a realization of H* := A~*W^. Notice that H* is a co-analytic spectral factor 

of A-*-^A-K 

2d) From H*, compute an analytic spectral factor Hi of A~*\&A~^ using the procedure detailed 

in Lemma A.l in the Appendix. 
2e) Compute a minimal realization of := GAHi. 

= C^{zl - A^r'B^. 

We get 

j GQ-^^Q-'G* = j = G^^Gl, (90) 

with being the solution of the Lyapunov equation 

Sx - ^xSx^x = ^x^x- (91) 
For the computation of the integrals Yi in (85), we employ the same technique. The main 
difference is that the integrand of (85) is not a spectral density. Nevertheless, we observe that, 
by factoring Q as Q = A*A (exactly as we have done in point 2b) above) and by defining the 
functions $i,$2 G as 

$1 := A-*G*XGA, $2 := A-*^'A, (92) 

we may rewrite such an integrand in the form 

GA-*[$i$2 + $i$2]A-*G'* = G'A-*[($i + $2)(^i + ^2)* - - $2^2]^"*^'* (93) 
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It is therefore clear that the integrand of (85) is a difference of spectral densities. Hence, 
the integral (85) may be computed by resorting to the same technique detailed above for the 
computation of Y. 



C. Simulation results 



We have apphed the procedure described in Section VIII-A to many different examples and 
it performed very well even in the case of large values of n and m (recall that B G C"^"^). For 
example, for n = 12 and m = 6 the algorithm converged in less than 10 minutes in an Apple 
G4 IGHz computer. In the scalar case (m = 1) some examples are discussed in [20] for the case 
of the HelUnger distance and in [45] for the case of the KL pseudo-distance. In the following 
we discuss a simple multivariable example (n = 3, m = 2). Choose 





1/3 










1 







9/4 


12/5 





A = 





1/2 





B = 


1 


1 


S = 


12/5 


16/3 


16/7 










1/4 







1 







16/7 


32/15 



A, B and S satisfy the feasibility condition (6). After re-normalizing so that I! = /, we get: 



0.6191 -0.0471 
0.2697 0.2711 
-0.0458 0.6393 



0.2309 -0.3657 -0.2046 
0.1368 0.7935 0.2434 , 5~ 
-0.0886 -0.4086 0.0590 

Finally, we have chosen the reference spectral density ^' to be identically equal to / (the identity). 
In this case the Kullback-Leibler approximation has the interpretation of maximum entropy 
solution and may be obtained in closed form [26]: 



^KL = [G*B(B*B)-^B*G] ^ 



(94) 



For the computation of the Hellinger approximation, we have set in (79) e := exp(— 12). 
Our procedure converged in 12 steps of the Newotn algorithm with backtracking (less than 
10 seconds) to the matrix 

-0.4267 -0.0621 0.0005 
-0.0621 -0.1007 -0.0269 
0.0005 -0.0269 -0.4690 
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-0.1 



Fig. 1. First picture: Graphics of (e"') (bold line) and $2^}^' (e*'') (thin line) as functions of Second picture: Graphics 
of $2'^^(e*'') (bold Hne) and (e*'') (thin line) as functions of 'd. TWrd picture: Graphics of ^_^%''^\e^'^)] (bold line) 
and 3fi[4^f ^(e*'')] (thin Une) as functions of d. Fourth picture: Graphics of %^%''^\e^^)\ (bold Une) and 9[4'^£^(e*'')] (thin 
Une) as functions of 



Let be the corresponding Hellinger approximation computed as in (59). Let $55'^'^ be the 
entry in row i and column j of and similarly define ^'^l- Figure 1, 'l*^'^'', <l^'^'' and the 
real and imaginary parts of <l>}j' ^ are depicted together with the corresponding entries of ^^kl- 
It may be worthwhile to observe that, with respect to the Li distance (which induces a 
natural metric for spectral densities), HelUnger approximation does better than KuUback-Leibler 
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approximation in this example. More precisely, we have 



tr 



1/2 



~ 1.6982 



tr 



1/2 



2.5079 
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Appendix 
a spectral factorization result 

In this appendix we collect a side result on spectral factorization. 

Lemma A.l: Let A be a stability matrix and H{z) = C{zl — A)~^B + D be a minimal 
realization. Let P be the solution of the Lyapunov equation 



Let 



K 
J 



P - A*PA = CC*. 
be an ortho-normal basis o/ker [A*P-^/'^ C* ] i.e. 



(95) 



K 
J 



0, 



[K* J* 



K 
J 



(96) 



(97) 



[^*pl/2 (J* 

Let G := P-^'^K and 

Hi(z) := {D*C + B*PA)(zI - A)-^G + B*PG + D*J. 

Then, H*H = HiH^. 

Proof: Let Q := C{zl - A)-^G + J. We have 

Q*Q = G*{z-^I-A*)-^C*G{zI-A)-^G+G*{z-^I-A*)-^G*J+J*C{zI-A)-^G+rj (98) 

Now let P > be the solution of the Lyapunov equation (95). Then, 

C*C = -{z-^I - A*)P{zI -A) + (z-^I - A*)Pz + z-^P{zI - A) (99) 
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Substituting (99) into (98) we obtain 

Q*Q = -G*PG + G*Pz{zI -A)-'^G + G*{z-^I -A*)-h-^PG 

+G*{z-^I - A*)-^C*J + J*C{zI - A)-^G + J* J (100) 

Moreover, 

z{zl - A)-^I ^A{zl - A)-^ and {z'^ I - A*)'^ z''^ ^ I ^ {z'^ I - A*)'^ A* (101) 
so that 

g*Q = ( J*C + G*PA){zI - A)-^G + (( J*C + G*PA){zI - Ay^G)* + G*PG + J* J (102) 

Taking (96) into account, it is easy to see that Q*Q = I. Therefore, H*H = H*QQ*H. Recalling 
(99) and (101), we eventually get 

Q*H = {G*{z-^I-A*)-^C* + r){C{zI-A)-^B + D) 

= -G*{z-^I - A*)-\z-^I - A*)P{zI - A){zl - A)-^B 
+G*{z-^I - A*)-Hz-^I - A*)Pz{zI - A)-^B 
+G*{z-^I - A*)-h-^P{zI - A){zl - A)-^B 

+G*{z-^i - A*)-^c*D + rc{zi - A)-'^B + ro 

= -G*PB + G*Pz{zI - A)-^B + G*{z-^I - A*)-^z-^PB 

+G*{z-^i - A*)-^c*D + rc{zi - A)-^B + ro 

= -G*PB + G*P (/ + A{zl - A)-^) B + G*{I + (z'^I - A*)-^A*) PB 

+G*{z-^I - A*)-^C*D + J*C{zI - A)-^B + J*D 
= G*PB + G*PA{zI - A)-^B + G*{z-^I - A*)-^A*PB 

+G*{z-^i - A*)-^c*D + rc{zi - A)-^B + ro 

= G*(z-^I - A*)-\C*D + A*PB) + {G*PA + rc){zl - A)-^B + G*PB + ro 
= G*{z-^I-A*)-\C*D + A*PB) + G*PB + J*D = H* (103) 

□ 
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